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Abstract 

Background: With the increasing availability of high dimensional time course data for metabolites, genes, and fluxes, 
the mathematical description of dynamical systems has become an essential aspect of research in systems biology. 
Models are often encoded in formats such as SBML, whose structure is very complex and difficult to evaluate due to 
many special cases. 

Results: This article describes an efficient algorithm to solve SBML models that are interpreted in terms of ordinary 
differential equations. We begin our consideration with a formal representation of the mathematical form of the 
models and explain all parts of the algorithm in detail, including several preprocessing steps. We provide a flexible 
reference implementation as part of the Systems Biology Simulation Core Library, a community-driven project 
providing a large collection of numerical solvers and a sophisticated interface hierarchy for the definition of custom 
differential equation systems. To demonstrate the capabilities of the new algorithm, it has been tested with the entire 
SBML Test Suite and all models of BioModels Database. 

Conclusions: The formal description of the mathematics behind the SBML format facilitates the implementation of 
the algorithm within specifically tailored programs. The reference implementation can be used as a simulation 
backend for Java™-based programs. Source code, binaries, and documentation can be freely obtained under the 
terms of the LGPL version 3 from http://simulation-core.sourceforge.net. Feature requests, bug reports, contributions, 
or any further discussion can be directed to the mailing list simulation-core-development@lists.sourceforge.net. 

Keywords: Systems biology, Biological networks, Mathematical modeling, Simulation, Algorithms, Ordinary 
differential equation systems, Numerical integration, Software engineering 



Background 

As part of the movement towards quantitative biology, 
the modeling, simulation, and computer analysis of bio- 
logical networks have become integral parts of modern 
biological research [1]. Ambitious national and interna- 
tional research projects such as the Virtual Liver Network 
[2] strive to derive even organ-wide models of biolog- 
ical systems that include all kinds of processes taking 
place at several levels of detail Large-scale efforts like this 
require intensive collaboration between various research 
groups, including experimenters, modelers, and bioinfor- 
maticians. The exchange, storage, interoperability, and the 
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possibility to combine models have been recognized as 
key aspects of this endeavor [3-6]. 

XML-based standard description formats such as the 
Systems Biology Markup Language (SBML) [7,8] and 
CellML [9,10] enable encoding of quantitative biological 
network models. To facilitate sharing and re-use of the 
models, online databases such as BioModels Database [11] 
and the CellML model repository [12] provide large col- 
lections of published models. Software libraries for read- 
ing and manipulating the content of these formats are also 
available [13-15] as well as end-user programs supporting 
these model description languages. 

The models encoded in these formats can be inter- 
preted in terms of several modeling frameworks, includ- 
ing, but not limited to, differential equation systems, with 
additional structures such as discrete events and alge- 
braic equations. The diversity of modeling approaches 
and experimental data often requires customized software 
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solutions for very specific tasks. For efficient analysis, sim- 
ulation, and calibration (e.g., the estimation of parameter 
values) of biological network models a multiple-purpose 
and efficient numerical solver library is prerequisite. 
Although the language specifications of SBML [16-22] 
and CellML [23] describe the semantics of models in 
these formats and their interpretation, the algorithmic 
implementation is still not straightforward. 

The SBML community offers standardized and manu- 
ally derived benchmark tests [24] in order to evaluate the 
quality of simulation results, because it has been recog- 
nized that in many cases different solver implementations 
lead to divergent results [25]. The availability of this test 
suite and the currently much larger variety of support- 
ing software for SBML a in comparison to CellML are the 
reasons that in this work we focus on the simulation of 
models encoded in the SBML format. 

We address the question of how to precisely interpret 
these models in terms of ordinary differential equation 
systems. Furthermore, we show how to adapt existing 
numerical integration routines in order to simulate these 
models. To this end, we derive a new algorithm for the 
accurate interpretation and simulation of all currently 
existing levels and versions of SBML. To demonstrate 
the usefulness of the algorithm, we introduce an exhaus- 
tive reference implementation in Java™. The algorithm 
described in this paper is, however, not limited to any 
particular programming language. 

It is also important to note that the interpretation of 
these models must be strictly separated from the numer- 
ical method that solves the implied differential equation 
system. In this way, a similar approach would also be pos- 
sible for other systems biology community formats. In 
particular, the architecture of the reference implementa- 
tion described herein has been ab ovo designed with the 
aim to be complemented by a CellML module. 

As the result, we present the Systems Biology Simulation 
Core Library, a platform-independent, well-tested generic 
open-source library. The library is completely decoupled 
from any graphical user interface and can therefore eas- 
ily be integrated into third-party programs. It comprises 
several ordinary differential equation (ODE) solvers and 
an interpreter for SBML models. It is the first simulation 
library based on JSBML [15]. 

Furthermore, the Systems Biology Simulation Core 
Library contains classes to both export simulation con- 
figurations to the Simulation Experiment Description 
Markup Language (SED-ML) [26], and facilitate the re- 
use and reproduction of these experiments by executing 
SED-ML files. 

Results and discussion 

In order to derive an algorithm for the interpretation of 
SBML models in a differential equation framework, it is 



first necessary to take a closer look at the mathematical 
equations implied by this data format. Based on this 
general description, we will then discuss all necessary 
steps to deduce an algorithm that takes all special cases 
for the various levels and versions of SBML into account. 

A formal representation of models In systems biology 

The mathematical structure of a reaction network com- 
prises a stoichiometric matrix N, whose rows correspond 
to the reacting species S within the system, whereas its 
columns represent the reactions, i.e., bio-transformations, 
in which these species participate. The velocities v of the 
reaction channels R determine the rate of change of the 
species' amounts: 

d - 

— S = Nv(S,t,N,W,p). (1) 
at 

The parameter vector p contains rate constants and other 
quantities that influence the reactions' velocities. Accord- 
ing to Liebermeister et al. [27,28] the modulation matrix 
W is defined as a matrix of size \R\ x \S\ containing a 
numerical representation of the type of the regulatory 
influences of the species on the reactions, e.g., competitive 
inhibition or physical stimulation. Integrating the homo- 
geneous ordinary differential equation system (1) yields 
the predicted amounts of the species at each time point of 
interest within the interval [ to, tj\\ 

S= f r Nv(5,£,N,W,#)d/:, (2) 
J to 

where to e R and to < tj. Depending on the units of 
the species, the same notation can also express the change 
of the species' concentrations. In this simple case, solving 
equation (2) can be done in a straightforward way using 
many (numerical) differential equation solvers. The non- 
linear form of the kinetic equations in the vector function 
v constitutes the major difficulty for this endeavor and is 
often the reason why an analytical solution of these sys- 
tems is not possible or very hard to achieve. Generally, 
differential equation systems describing biological net- 
works are, however, inhomogeneous systems with a higher 
complexity. Solving systems encoded in SBML can be seen 
as computing the solution of the following equation: 

Q = [ r Nv(Q^,N,W,^)+|(Q,Od^+/£(Q,0+r(Q,0 ) 
J to 

(3) 

with to = 0 and tj e R+. The vector Q of quanti- 
ties contains the sizes of the compartments C, amounts 
(or concentrations) of reacting species S, and the values 
of all global model parameters P. It should be noted that 
these models may contain local parameters p that influ- 
ence the reactions' velocities, but which are not part of the 
global parameter vector P, and hence also not part of Q. 
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All vector function terms may involve a delay function, 
i.e., an expression of the form delay (x, r) with r > 0. It is 
therefore possible to address values of x computed in the 
earlier integration step at time t — r, turning equation (3) 
into a delay differential equation (DDE). Note that x can 
be an arbitrarily complex expression. 

In the general case of equation (3), not all species' 
amounts can be computed by integrating the transfor- 
mation Nv: the change of some model quantities may 
be given in the form of rate rules by function g(Q, t). 
Species whose amounts are determined by rate rules must 
not participate in any reaction and hence only have zero- 
valued corresponding entries in the stoichiometric matrix 
N. Thereby, the rate rule function g(Q, t) directly gives the 
rate of change of these quantities, and returns 0 for all 
others. 

In addition, SBML introduces the concept of events 
/e(Q> t) and assignment rules r(Q, t). An event can directly 
manipulate the value of several quantities, for instance, 
reduce the size of a compartment to a certain portion of its 
current size, as soon as a trigger condition becomes satis- 
fied. An assignment rule also influences the absolute value 
of a subject quantity. 

A further concept in SBML is that of algebraic rules, 
which are equations that must evaluate to zero at all times 
during the simulation of the model. These rules can be 
solved to determine the values of quantities whose values 
are not determined by any other construct. In this way, 
conservation relations or other complex interrelations can 
be expressed in a very convenient way. With the help of 
bipartite matching [29] and a subsequent conversion it is 
possible to turn algebraic rules into assignment rules and 
hence include these into the term r(Q, t). Such a transfor- 
mation, however, requires symbolic computation and is 
thus a complicated endeavor. 

When the system under study operates at multiple time 
scales, i.e., it contains a fast and a slow subsystem, a sep- 
aration of the system is necessary, leading to differential 
algebraic equations (DAEs). Some species can be declared 
to operate at the systems boundaries, assuming a constant 
pool of their amounts or concentrations. Care must also 
be taken with respect to the units of the species, because 
under certain conditions division or multiplication with 
the sizes of their surrounding compartments becomes 
necessary in order to ensure the consistent interpretation 
of the models. For all these reasons, solving equation (3) 
is much more complicated than computing the solution of 
the simple equation (2) alone. 

From the perspective of software engineering, a strict 
separation of the interpretation of the model and the 
numerical treatment of the differential equation system 
is necessary to ensure that regular numerical methods 
can be used to solve equation (3). In order to efficiently 
compute this solution, multiple preprocessing steps are 



required, such as the conversion of algebraic rules into 
assignment rules, or avoiding repeated recomputation of 
intermediate results. The next sections will give a detailed 
explanation of the necessary steps to solve these systems 
and how to efficiently perform their numerical integration 
with standard numerical solvers. 

Initialization 

At the beginning of the simulation the values of species, 
parameters and compartments are set to the initial values 
given in the model. All rate laws of the reactions, assign- 
ment rules, transformed algebraic rules (see below), initial 
assignments, event assignments, rate rules and function 
definitions are integrated into a single directed acyclic 
syntax graph. This graph is thus the result of merging 
the abstract syntax trees representing all those individ- 
ual elements. Equivalent elements are only contained 
once. In comparison to maintaining multiple syntax trees, 
this solution significantly decreases the computation time 
needed for the evaluation of syntax graphs during the 
simulation. Figure 1 gives an example for such a syntax 
graph. 

After the creation of this graph, the initial assignments 
and the assignment rules (including transformed algebraic 
rules) are processed and initial values defined by these 
constructs are computed. 



9 ? 




Figure 1 Example for the creation of an abstract syntax graph of 
a small model. This figure displays a unified representation of kinetic 
equations from an example model that consists of the following 
reactions: R ] : F1,6BP ^ DHAP + GA3P, R 2 : DHAP ^ GA3P. Both 
reactions are part of the glycolysis. The contained molecules are 
fructose 1,6-bisphosphate (F1,6BP), dihydroxyacetone phosphate 
(DHAP), and glyceraldehyde 3-phosphate (GA3P). Using the program 
SBMLsqueezer [31] the following mass action kinetics have been 
created: = k +] .[F1,6BP] -/c_i -[DHAP] -[GA3P] , vr 2 = k +2 - 
[DHAP] -/c_ 2 -[GA3P].The nodes for [DHAP] and [GA3P] are only 
contained in the syntax graph once and connected to more than one 
multiplication node. This figure clearly indicates that the syntax graph 
is not a tree. As can be seen in this picture, the outdegree of syntax 
trees does not have to be binary. 
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Solving algebraic rules 

The most straightforward approach to deal with algebraic 
rules is to convert them to assignment rules, which can 
in turn be directly solved. In every equation of an alge- 
braic rule, there should be at least one variable whose 
value is not yet defined through other equations in the 
model. This variable has to be determined for the pur- 
pose of interpreting the algebraic rule. At first, a bipartite 
graph is generated according to the SBML specifications 
[19-22]. This graph is used to compute a matching using 
the algorithm by Hopcroft and Karp [29]. The initial 
greedy matching is extended with the use of augmenting 
paths. This process is repeated until no more augment- 
ing paths can be found. Per definition, this results in a 
maximal matching. As stated in the SBML specifications 
[19-22], if any equation vertex remains unconnected after 
augmenting the matching as far as possible, the model is 
considered overdetermined and thus is not a valid SBML 
model. If this is not the case, the mathematical expression 
of every algebraic rule is thereafter transformed into an 
equation with the target variable on its left-hand side, and 
hence fulfills the definition of an assignment rule. The left- 
hand side is represented by the respective variable vertex, 
to which the considered algebraic rule has been matched. 
Figure 2 displays the described algorithm in the form of a 
flow chart. 

Event handling 

An event in SBML is a list of assignments that is exe- 
cuted depending on whether a trigger condition switches 
from false to true. In addition, SBML enables modellers 
to define a delay which may postpone the actual execu- 
tion of the events assignments to a later point in time. 
With the release of SBML Level 3 Version 1, the process- 
ing of events has been raised to an even higher level of 
complexity: in earlier versions it was sufficient to deter- 
mine, when an event triggers and when its assignments 
are to be executed. In Level 3 Version 1 only a few new 
language elements have been added, but these have a sig- 
nificant impact on how to handle events: for example, the 
order, in which events have been processed, used to be at 
programmer's discretion in SBML Level 2, but in Level 3 
Version 1 it is given by the event s priority element. Coor- 
dinating the sequence, in which events are to be executed, 
has now become the crucial part of event handling. Fur- 
thermore, there exists the option to cancel an event during 
the time since its trigger has been activated and the actual 
time when the scheduler picks the event for execution. 
Events that can be cancelled after the activation of their 
triggers are called nonpersistent. 

At every time step, the events to be executed are a union 
of two subsets of the set of all events. On one hand, there 
are events whose triggers have been activated at the cur- 
rent time and which are to be evaluated without delay. 



On the other hand, there are events triggered at some 
time point before, and whose delay reaches till the cur- 
rent point in time. For every element of the resulting 
set of events , the priority rule must be evaluated. One 
event is randomly chosen for execution from all events 
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Figure 2 Algorithm for transforming algebraic rules to 
assignment rules. The first step is to decide whether the model is 
overdetermined by creating a matching between the equations and 
the variables of a model. For this purpose, an initial greedy matching 
is computed based on a bipartite constructed according to the SBML 
specifications. To obtain a maximal matching, augmenting paths are 
determined and the current matching is extended. If there are no 
augmenting paths available anymore, the computed matching is 
maximal. Having an unconnected equation vertex results in an 
overdetermined model. If the matching is not overdetermined, for 
each algebraic rule an assignment rule is generated. The left-hand 
side of each rule corresponds to the variable the respective algebraic 
rule has been matched to. 
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of highest priority. In principle, all other events could be 
processed in the same manner, but the assignment of the 
first event can change the priority or even the trigger 
condition of the events that have not yet been executed. 
Therefore, the trigger of nonpersistent events and the pri- 
ority of the remaining events have to be evaluated again. 
In this case, the event that has now the highest priority 
is chosen as next. This process must be repeated until 
no further events are left for execution. Figure 3 shows 
the slightly simplified algorithm for event processing at a 
specific point in time: Let E be the set of all events in a 
model, and E\ be the set of events whose trigger condi- 
tions have already been evaluated to true in previous time 
steps. We refer to elements within E\ as inactive events. 



We define the set E^ as the subset of E containing events 
whose trigger condition switches from false to true at the 
current time step t. At the beginning of the event han- 
dling, Ea is empty. We call an event persistent, if it can 
only be removed from £a under the condition that all of 
its assignments have been evaluated. This means that a 
nonpersistent event can be removed from E\ when its trig- 
ger condition becomes false during the evaluation of other 
events. The function trig(e) returns 1 or 0 depending on 
whether or not the trigger condition of event e e E is sat- 
isfied. Similarly, the function persist(e) returns 1 if event e 
is persistent, or 0 otherwise. 

The interpretation of events is the most time consuming 
step of the integration procedure. This is why efficient and 




E A :=(D 

E l := {e | trig(e) = 1} 



E A := E A \ 
{e | trig(e) = 0 A 
persist(e) = 0} 




(Stop - 

V, Yes 

Figure 3 Processing of events: simplified algorithm (handling of delayed events omitted). At each iteration, the trigger conditions of active 
events e a e /Fa that are not persistent are checked. If the trigger condition of such an event has changed from true (1) to false (0), the event is 
removed from /Fa- The next step comprises the evaluation of the triggers of all events. If its trigger changes from false to true, an event is added to 
the set of active events /Fa. An event with its trigger changed from true to false is removed from the list of inactive events. After the processing of all 
triggers, the event e of highest priority in the set of active events is chosen for execution by the function choose(^A)- Note that priorities are not 
always defined, or multiple events may have an identical priority. The function choose(^A) is therefore more complex than shown in this figure. The 
selected event is then processed, i.e., all of its assignments are evaluated, and afterwards the triggers of all events in E have to be evaluated again, 
because of possible mutual influences between the events. The algorithm proceeds until the set /Fa of active events is empty. 
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clearly organized data structures are required to ensure 
high performance of the algorithm. 

Time step adaptation considering events and the 
calculation of derivatives 

The precise calculation of the time when events are 
triggered is crucial to ensure exact results of the numerical 
integration process. It could, for instance, happen that an 
event is triggered at time t T , which is between the inte- 
gration time points t T -\ and £ T+ i. When processing the 
events only at time points t T -\ and £ r+ i, it might happen 
that the trigger condition cannot be evaluated to true at 
neither of these time points. Hence, a numerical integra- 
tion method with step-size adaptation is required in order 
to hit the correct time points. Rosenbrocks method [30] 
can adapt its step size h if events occur (see Figure 4 for 
details). For a certain time interval [tr-i^r+i] an d the 
current vector Q, Rosenbrocks method determines the 



new value of vector Q at a point in time t x -\ + /z, with 
h > 0. If the error tolerance cannot be respected, h is 
reduced and the procedure is repeated. 

After that, the events and the assignment rules are 
processed at the new point in time t T -\ + h. If the 
previous step causes a change in Q, the adaptive step size 
is decreased by setting h to V 10 an d the calculation is 
repeated until either the minimum step size is reached or 
the processing of events and assignment rules does not 
change Q anymore. Hence, the time at which an event 
takes place is precisely determined. 

For given values Q at a point t in time the current vector 

of derivatives Q is calculated as follows. First, the rate rules 
are processed Q = g(Q, t). Note that function^ returns 0 
in all dimensions in which no rate rule is defined. Second, 
the velocity V{ of each reaction channel R{ is computed 
with the help of the unified syntax graph (e.g., Figure 1). 
The velocity functions depend on Q at time t. During the 




t := t n 




Q:=Q 




h := adapt (h) 




Figure 4 Refined step-size adaptation for events. For a certain time interval, the Rosenbrock solver (KiSAO term 33) always tries to increase time t 
by the current adaptive step size h and calculates a new vector of quantities Q ne xt- After a successful step, the events and rules of the model are 
processed. If this causes a change in Q, h is first decreased and the Rosenbrock solver then calculates another vector Q ne xt using this adapted step 
size. The precision of the event processing is therefore determined by the minimum step size h m i n .The adapt function is defined by Rosenbrock's 
method [30]. 
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second step, the derivatives of all species that participate 
in the current reaction Ri need to be updated (see the 
flowchart in Figure 5). 

A reference implementation of the algorithm 

The algorithm described above has been implemented 
in Java™ and included into the Systems Biology Simula- 
tion Core Library. Figure 6 displays an overview of the 
software architecture of this community project, which 
has been designed with the aim to provide an exten- 
sible numerical backend for customized programs for 
research in computational systems biology. The SBML- 
solving algorithm is based on the data structures pro- 
vided by the JSBML project [31]. With the help of 
wrapper classes several numerical solvers originating 
from the Apache Commons Math library [32] could be 
included into the project. In addition, the library pro- 
vides an implementation of the explicit fourth order 
Runge-Kutta method, Rosenbrocks method, and Eulers 
method. 

Due to the strict separation between numerical differ- 
ential equation solvers, and the definition of the actual 
differential equation system, it is possible to implement 
support for other community standards, such as CellML 
[9]. 

In order to support the standard Minimum Information 
About a Simulation Experiment (MIASE) [33], the library 
also provides an interpreter of Simulation Experiment 
Description Markup Language (SED-ML) files [26]. These 
files allow users to store the details of a simulation, includ- 
ing the selection and all settings of the numerical method, 
hence facilitating the creation of reproducible results. A 
simulation experiment can also be directly started by pass- 
ing a SED-ML file to the interpreter in this library. Each 
solver has a method to directly access its correspond- 
ing Kinetic Simulation Algorithm Ontology (KiSAO) term 
[34] to facilitate the execution of SED-ML files. 

Many interfaces, abstract classes, and an exhaustive 
source code documentation in the form of JavaDoc facili- 
tate the customization of the library. For testing purposes, 
the library contains a sample program that benchmarks 
its SBML interpreter against the entire SBML Test Suite 
version 2.3.2 [24]. 

Benchmark and application to published models 

The reference SBML implementation has successfully 
passed the SBML Test Suite [24] using the Rosenbrock 
solver. The results are shown in Table 1. All models 
together can be simulated within seconds, which means 
that the simulation of one SBML model takes only mil- 
liseconds on average, using regular desktop computers. 

The total simulation time for all models in SBML Level 3 
Version 1 is significantly higher than for the models in 
other SBML levels and versions. This can be explained 



Compute all assignment rules: 

Q ~ g(Q,t) 




Compute flux Jj through Rj\ 



i 



for each species i in Rj\ 



j ■= j + i 




Figure 5 Calculation of the derivatives at a specific point in time. 

First, the vector for saving the derivatives of all quantities Q is set to 
the null vector 0. Then the rate rules of the model are processed by 
solving the function g(Q, t), which can change Q in some dimensions. 
After that for every reaction channel Rj its velocity Jj is computed. The 
derivatives of each species (with index /') participating in the currently 
processed reaction channel Rj are updated in each step adding the 
product of the stoichiometry n-,j and the reaction's velocity Jj. In this 
figure, the stoichiometric values n,y in the matrix N are assumed to be 
constant for the sake of simplicity. These values can be variable. 
Before Level 3, SBML provided StoichiometryMath elements 
that could be used for a direct computation of the stoichiometry. In 
Level 3, the StoichiometryMath element has been removed 
and these values can be changed by treating them as the subject of 
assignment rules. In both cases, the values for n,y have to be updated 
in each simulation step. 
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Figure 6 Architecture of the Systems Biology Simulation Core Library (simplified). Numerical methods are strictly separated from differential 
equation systems. The upper part displays the unified type hierarchy of all currently included numerical integration methods. The middle part 
shows the interfaces defining several special types of the differential equations to be solved by the numerical methods. The class 
SBMLinterpreter (bottom part) implements all of these interfaces with respect to the information content of a given SBML model. Similarly, an 
implementation of further data formats can be included into the library. 



by the fact that the test suite contains some models of 
this version whose evaluation requires a time-consuming 
processing of a large number of events. In particular, the 
simulation of model No. 966 of the SBML Test Suite, 
which is only provided in SBML Level 3 Version 1, takes 
20 s because it contains 23 events to be processed. Two 
events fire every 10 -2 time units within the simulation 
time period of 1,000 time units. These events must there- 
fore be evaluated thousandfold within the specified time 
interval. The evaluation of this model accounts for over 
50% of the total simulation time for the models in SBML 
Level 3 Version 1. 

An implementation of an SBML solver that passes 
the test suite should in principle also be capable of 
computing the solution of all models from BioModels 
Database, a resource that contains a collection of pub- 
lished and curated models. This online database cur- 
rently provides neither reference data for the models, 



nor any settings for the numerical computation (such 
as step size, end time etc.). However, it offers pre- 
computed plots of the time courses for the vast major- 
ity of models. Therefore, while it cannot be directly 
used as a benchmark test, it can help checking that 
a solver implementation supports all features of many 
published models and that the algorithm always suc- 
cessfully terminates. The Systems Biology Simulation 
Core Library solves all curated models from BioModels 
Database (release 23, October 2012) without raising any 
errors, see Methods for details. These results suggest the 
reliability of the simulation algorithm described in this 
work. 

In the following, we select two models that exhibit 
diverse features from this repository to illustrate the capa- 
bilities of this library: BioModels Database model No. 206 
by Wolf et al [35] and BioModels Database model No. 390 
by Arnold and Nikoloski [36]. 
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Table 1 Simulation of the models from the SB ML Test Suite using the Rosenbrock solver 


Level 


Version 


Models 


Correct simulations 


Total running time (in s) 


1 


2 


252 


252 


2.9 


2 
2 


1 

2 


885 
1,041 


885 
1,041 


6.8 
6.8 


2 


3 


1,041 


1,041 


6.8 


2 


4 


1,043 


1,043 


6.8 


3 


1 


1,077 


1,077 


38.5 



This table shows the number of tested models and the total running times of the tests for all SBML levels and versions, where the time for reading the file has been 
excluded from the analysis. The measured elapsed time therfore gives the CPU time needed for the computation only (see Methods). 



The model by Wolf et al [35] mimics glycolytic 
oscillations that have been observed in yeast cells. The 
model describes how the dynamics propagate through 
the cellular network comprising eleven reactions, which 
interrelate nine reactive species. Figure 7a displays the 
simulation results for the intracellular concentrations 
of 3-phosphogylcerate, ATP, glucose, glyceraldhyde 3- 
phosphate, and NAD + : after an initial phase of approxi- 
mately 15 s all metabolites begin a steady-going rhythmic 
oscillation. Changes in the dynamics of the fluxes through 
selected reaction channels within this model can be 
seen in Figure7b. 

By comparing a large collection of previous mod- 
els of the Calvin-Benson cycle, Arnold and Nikoloski 
created a quantitative consensus model that comprises 



eleven species, six reactions, and one assignment rule 
[36] . All kinetic equations within this model call special- 
ized function definitions. Figure 8 shows the simulation 
results for the species ribulose-l,5-bisphosphate, ATP, 
and ADP within this model. As in the previous test case, 
the dynamics computed by the Simulation Core Library 
reproduce the figures provided by BioModels Database. 

Comparison to existing solver implementations for SBML 

In order to benchmark our software, we chose similar 
tools exhibiting the following features from the SBML 
software matrix [39]: 
• The last updated version was released after the final 

release of the specification for SBML Level 3 

Version 1 Core, i.e., October 6 th 2010. 
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Figure 7 Simulation of glycolytic oscillations. This figure displays the results of a simulation computed with the Systems Biology Simulation Core 
Library based on model No. 206 from BioModels Database [35,37]. A) Shown are the changes of the concentration of the most characteristic 
intracellular metabolites 3-phosphoglycerate, ATP, glucose, glyceraldehyde 3-phosphate (GA3P), and NAD + within yeast cells in the time interval 
[0,30] seconds. B)This panel displays a selection of the dynamics of relevant fluxes (D-glucose 6-phosphotransferase, glycerone-phosphate- 
forming, phosphoglycerate kinase, pyruvate 2-O-phosphotransferase, acetaldehyde forming, ATP biosynthetic process) that were computed as 
intermediate results by the algorithm. The computation was performed using the Adams-Moulton solver [38] (KiSAO term 280) with 200 integration 
steps, 10 -10 as absolute error tolerance and 10 -5 as relative error tolerance. Due to the importance of feedback regulation the selection of an 
appropriate numerical solver is of crucial importance for this model. Methods without step-size adaptation, such as the fourth order Runge-Kutta 
algorithm (KiSAO term 64), might only be able to find a high-quality solution with an appropriate number of integration steps. The simulation results 
obtained by using the algorithm described in this work reproduces the results provided by BioModels Database. 
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Figure 8 Simulation of the Calvin-Benson cycle. Another example 
of the capabilities of the Simulation Core Library has been obtained 
by solving model No. 390 from BioModels Database [36,37]. This 
figure shows the evolution of the concentrations of ribulose 
1 ,5-bisphosphate, a key metabolite for CO2 fixation in the reaction 
catalyzed by ribulose-1 ,5-bisphosphate carboxylase oxygenase 
(RuBisCO), and the currency metabolites ATP and ADP during the first 
35 s of the photosynthesis. This model was simulated using Euler's 
method (KiSAO term 30) with 200 integration steps. 



• Support for SBML Level 3. 

• Open-source software 

• No dependency on commercial products that are not 
freely available (e.g., MATLAB™ or Mathematica™) 

The selected programs are in alphabetical order: BioUML 
[40], COPASI [41], iBioSim [42], JSim [43], LibSBMLSim 
[44], and VCell [45,46]. Table 2 summarizes the compar- 
ison of the most recent versions of all six programs. It 
should be mentioned that this comparison can only mir- 
ror a snapshot of the ongoing development process of 
all programs at the time of writing. An up-to-date com- 
parison of the capabilities of SBML solvers can be found 
online [47]. 

Limitations and perspective 

The modifications done to the Rosenbrock solver enable a 
precise timing of events during simulation. However, this 
precise timing can lead to a noticeable increase in run- 
time when events are triggered in very small intervals, 
e.g., every 10 -3 time units. This behavior can, for exam- 
ple, be observed in BioModels Database model No. 408 
[48] (a model with three events). When the precise tim- 
ing of events is not of utmost importance, a solver other 
than Rosenbrock can be chosen. Furthermore, there are 
plans to improve the runtime behavior of the Rosen- 
brock solver for the simulation of models containing 
events. 

When dealing with stiff problems, Rosenbrocks method 
is a good choice, because it is has been designed for 



stiff pODE. However, our experiments show, that the 
Rosenbrock solver can be inefficient for non-stiff prob- 
lems in comparison to other solvers. This issue can lead 
to an increased run-time regarding large models such 
as model No. 235 of the BioModels Database, which 
contains 622 species that participate in 778 reactions, 
distributed accross three compartments [49]. In some 
cases, tuning the relative and absolute tolerance can help, 
but depending on the systems structure, Rosenbrocks 
method is sometimes stretched to its limits. The Runge- 
Kutta-Fehlberg method [50] (KiSAO term 86), which is 
included in iBioSim, shows also an increase in run-time 
concerning this model. 

The performance of the Runge-Kutta-Fehlberg and 
Rosenbrock methods show, however, that simpler ODE 
solvers can have more difficulties with some biological 
models than more advanced solvers, such as CVODE from 
SUNDIALS [51] that can adapt to both non-stiff and 
stiff problems. The SUNDIALS library, which is incor- 
porated into BioUML, can handle complicated pODE 
significantly better, but since it is not available under 
the LGPL and no open-source Java version of these 
solvers can currently be obtained, we disregarded its 
use. 

Algebraic rules constitute an important problem for 
any implementation of the SBML standard. The unbound 
variable of each such equation can be efficiently identi- 
fied [29], whereas the transformation of an algebraic rule 
into an assignment rule includes symbolic computation 
and is very difficult to implement. In some cases, such 
a transformation is not even possible. Alternatively, the 
current value of the free variable in an algebraic equation 
could, for instance, be identified using nested intervals. 
However, this approach consumes a significantly higher 
run-time, because the nested intervals would have to be 
re-computed at every time step, whereas the transforma- 
tion approach considers every algebraic rule only once 
(during the initialization). 

Since Level 3, SBML entails one further aspect: it is 
now possible to add additional features to the model by 
declaring specialized extension packages. The algorithm 
discussed in this paper describes the core functionality of 
SBML. The extension packages are very diverse, reaching 
the graphical representation [53], the description of qual- 
itative networks, such as Petri nets [54], and many more. 
It is therefore necessary to separately derive and imple- 
ment algorithms for the interpretation of individual SBML 
packages. 

The agenda for the further development of the open- 
source project, the Systems Biology Simulation Core 
Library, includes the implementation of SBML exten- 
sion packages, support for CellML, and the incorporation 
of additional numerical solvers. Contributions from the 
community are welcome. 



Table 2 Comparison of SBML-capable simulators 



Program 


Version 


Fast 
reactions 


Difficult SBML 
elements 
Algebraic 
rules 


Events 


Fully SBML test 
Suite compliant 


SED-ML 


Programming 
language 


GUI 


API 

access 


Platform 


Comments 


BioUML 


0.9.4 


/ 


/ 


/ 


/ 


In a 

version 


Java 


/ 


JavaScript 


Independent 




COPASI 


4.9.45 






(/) 






C++ (with 
multiple bindings) 


/ 


/ 


Windows, Mac OS X, 
Linux, Solaris 




iBioSim 


2.4.5 


/ 


/ 


/ 


/ 


In a 

version 


Java, C 


/ 


(/) 


Windows, Mac OS X, 
Linux (Fedora 1 7) 




JSim 


2.10 




/ 








Java 


/ 


/ 


Windows, Mac OS X, 
Linux 




LibSBMLSim 


1.1.0 


/ 


/ 


/ 


(/) 




C (with multiple ) 
bindings) 




/ 


Windows, Mac OS X, 
Linux, Free BSD 




Simulation 


1.3 


/ 


/ 


/ 


/ 


/ 


Java 




/ 


Independent 




core library 
























VCell 


5.0 


/ 




/ 






Java frontend, 
C/C++ server 
backend 


/ 




Independent 


Internet 

connection 

required 



The table gives an overview about the most characteristic features of SBML-capable simulation programs (April 1 9 th 201 3). It shows which programs support the SBML elements fast reactions, algebraic rules, and events. 
Another key point is whether all models of the most recent SBML Test Suite [24] can be correctly solved. Note that in the SBML Test Suite column, a dash means that not all of its models can be correctly solved, because not all 
SBML elements are supported. LibSBMLSim, which is a simulation API written in C, can only read models given in SBML Level 2 Version 4 and SBML Level 3 (indicated by the checkmark in brackets). Similarly, a dash in the 
column for events means that not all possible cases for this language element can be correctly solved. COPASI, for instance, supports events in SBML, but not all of the current constructs. It should be mentioned that not all 
programs primarily focus on the use of ODE solvers. In iBioSim, for instance, the stochastic analysis of SBML is more important [52]. Furthermore, some programs such as VCell or COPASI do not use SBML as their native 
format. BioUML, iBioSim, and the Systems Biology Simulation Core Library, are the only simulation tools from this selection that pass all models of the SBML Test Suite across all levels and versions of SBML. Most programs 
provide direct access to their API. COPASI, LibSBMLSim, and the Systems Biology Simulation Core Library have particularly been designed for the use as a solver backend. The program iBioSim can be executed in a script, e.g., 
for batch processing of multiple models. 
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Conclusions 

The aim of this work is to derive a formal descrip- 
tion of the mathematics behind SBML together with 
an algorithm that efficiently solves it in terms of an 
ordinary differential equation framework. As an impor- 
tant design feature, the algorithm can be combined with 
existing numerical solvers in a plugin fashion. The Rosen- 
brock solver embodies a universal approach for simu- 
lation that can deal with stiff problems and precisely 
solve models containing arbitrary SBML elements. The 
description in this paper is intended to facilitate the imple- 
mentation of the algorithm within specifically tailored 
programs. 

Our tests indicate that at the moment only two other 
programs pass the entire test suite for all SBML levels 
and versions: BioUML, which is a workbench for mod- 
elling, simulation, and parameter fitting, and iBioSim. The 
reference implementation of the algorithm introduced in 
this work, the Systems Biology Simulation Core Library, 
is therefore the only API simulation library exhibiting this 
capability. 

The Systems Biology Simulation Core Library is an effi- 
cient Java tool for the simulation of differential equation 
systems used in systems biology. It can be easily inte- 
grated into larger customized applications. For instance, 
CellDesigner [55] has already been using it since ver- 
sion 4.2 as one of its integral simulation libraries. The 
stand-alone application SBMLsimulator [56] provides a 
convenient graphical user interface for the simulation of 
SBML models and uses it as a computational back-end. 
The abstract class structure of the library supports the 
integration of further model formats, such as CellML, 
in addition to its SBML implementation. To this end, 
it is only necessary to implement a suitable interpreter 
class. 

By including support for the emerging standard SED- 
ML, we hope to facilitate the exchange, archival and repro- 
duction of simulation experiments performed using the 
Systems Biology Simulation Core Library. 

Methods 

Implementation 

All the solver classes are derived from the abstract class 
AbstractDESSolver (Figure 6). Several solvers of the 
Apache Commons Math library (version 3.0) are inte- 
grated with the help of wrapper classes [32]. Numer- 
ical methods and the actual differential equation sys- 
tems are strictly separated. The class Mult i Table 
stores the results of a simulation within its Block data 
structures. 

The abstract description of differential equation sys- 
tems, with the help of several distinct interfaces, makes it 
possible to decouple them from a particular type of bio- 
logical network. It is therefore possible to pass an instance 



of an interpreter for a respective model description for- 
mat to any available solver. The interpretation of SBML 
models is split between evaluation of events and rules, 
computation of stoichiometric information, and computa- 
tion of the current values for all model components (such 
as species and compartments). 

For a given state of the ODE system, the class 
SBML interpreter, responsible for the evaluation of 
models encoded in SBML, returns the current set of 
time-derivatives of the variables. It is connected to an effi- 
cient MathML interpreter of the expressions contained in 
kinetic laws, rules and events (ASTNode Interpreter). 
The nodes of the syntax graph for those expressions 
depend on the current state of the ODE system. If the 
state has changed, the values of the nodes have to be 
recalculated (see Results). 

An important aspect in the interpretation of SBML 
models is the determination of the exact time at which 
an event occurs because this influences the precision 
of the systems variables. To this end, we adjusted 
an implementation of the Rosenbrock solver [57], an 
integrator with an adaptive step size, to a very pre- 
cise timing of the events. In addition to events, rules 
are also treated during integration. Basically, rules are 
treated like events that occur at every given point in 
time and are therefore processed in the same man- 
ner. For every object of the type AlgebraicRule, a 
new Assignment Rule object is generated by means 
of the preceding bipartite matching. They represent 
only temporary rules, that are incorporated in the sim- 
ulation process but do not influence the model in 
the SBML file. 

In the SBMLinterpreter, events are represented via 
an array containing one instance of Event InPr ogress 
for every event in the model. Thereby, the distinction 
between events with and without delays is made. Both 
types of events can be triggered multiple times before 
being executed. If no delay is defined, the assignments of 
the event are usually executed at the same point in time 
when the event has been triggered. However, when such 
an event is cancelled by other events, all of its assign- 
ments are also cancelled before execution. An event with 
delay can produce multiple further assignments within 
the time frame between the trigger time and the actual 
execution time. In order to deal with delayed events, the 
class SBMLEventlnProgressWithDelay keeps track 
of this via a list containing the points in time, at which 
the respective event has to be executed. When events 
are triggered more than once before execution, they have 
to be sorted in ascending order by their delay. This is 
neccesary, because in this case the delay of the very same 
event may vary. 

When the SBMLinterpreter is processing events 
with priority, the events with the highest priority are 
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stored in a list until one of them is selected for execution. 
Technically, the method of choice for the organization 
of such priority queues would apply a binary max heap 
data structure instead. The root of the heap represents 
the largest value in the heap. After its extraction, the 
heap property is restored so that the next largest value 
is moved to the root. However, as stated in Results, 
the execution of one event can influence the priority of 
the remaining events. It can possibly happen that many 
priorities simultaneously change, whereby the standard 
method to restore the max heap characteristic after 
extraction is not sufficient anymore. For this reason, we 
disregarded the use of more complex data structures for 
the current implementation. 

Since SBML Level 2 Version 1, it has also become 
possible to create user-defined functions. These function 
definition objects contain lambda calculus including an 
optional list of arguments together with the actual math- 
ematical expression of the function. During the initializa- 
tion phase, function definitions are also incorporated into 
the abstract syntax graph (Figure 1). For each function 
definition, its arguments defined in its lambda expression 
are mapped to their corresponding nodes in the abstract 
syntax graph. The evaluation of a syntax graph node 
with a user-defined function consists of several steps. The 
arguments are evaluated and then passed to their corre- 
sponding node in the graph via the mapping established 
before. After this step the nodes representing arguments 
have a specific value attached to them. Finally, the com- 
plete abstract syntax graph can be evaluated. Care must 
be taken, because several function definitions may have 
arguments with identical identifiers. All possible naming 
conflicts must be preempted. 

As part of the calculation of reaction velocities, 
the StoichiometryMath construct allows a dynamic 
change of a reactions stoichiometry over the course 
of the simulation. Since SBML Level 3 Version 1, the 
stoichiometry of a reaction can be directly altered, 
because it is now possible to address the identifier of a 
SpeciesRef erence as the target variable within rules 
or events. The SBMLinterpreter class flags reactions 
with changing stoichiometry during initialization and 
evaluates the corresponding abstract syntax graph anew if 
the stoichiometry is needed for calculation. 

The constraints introduce assumptions about a 
models behavior. Similar to the trigger of events, 
the abstract syntax graph of each constraint is eval- 
uated at every time step. In case of a violation 
the SBMLinterpreter generates an instance of 
ConstraintEvent that is then processed by the cor- 
responding ConstraintListener class. The user is 
informed about the constraint upon its violation via the 
standard Java Logger. The output message includes 
the point in simulation time and the message of the 



constraint. In addition, more advanced user-defined 
implementations of ConstraintListener can be 
added to the SBMLinterpreter, for instance, to notify 
a GUI about violations or display the associate messages 
in a more user-friendly way. 

SED-ML support is enabled by inclusion of the 
jlibsedml library [58] in the binary download. 
Clients of the Systems Biology Simulation Core Library 
can choose to use the jlibsedml API directly, or 
access SED-ML support via facade classes in the 
org . simulator . sedml package that do not require 
direct dependencies on j libsedml in their code. 

Default settings and configuration 

The standard preferences for simulating an SBML model 
consist of the Rosenbrock solver with an absolute toler- 
ance of 10 -12 and a relative tolerance of 10 -6 . On the 
basis of our experiments, this setup can handle most of the 
problems without further tuning. The Rosenbrock solver 
with its adaptive step size is the most effective solver 
in this library for stiff pODE. Nevertheless, the user has 
the possibility the choose another solver for integration. 
According to the SBML specifications, a model has to be 
simulated starting at time point 0.0. Since this library is 
not limited to SBML, the solvers also accept arbitrary start 
times. The user has also the possibility to specify the end 
of the simulation. Modifying the relative and absolute tol- 
erance can increase the accuracy of the results or decrease 
computation time. 

Simulation of models from BioModels Database 

All 424 curated models from BioModels Database [11] 
(release 23, October 2012) have been simulated with iden- 
tical settings, as suggested by Bergmann et al. [25]: time 
interval [0,10], the Rosenbrock solver, 10 -6 as relative 
and 10 -12 as absolute tolerance, and a step size of 0.01 
time units. For the models No. 234 [59] and No. 339 [60] 
from BioModels Database the absolute tolerance had to 
be set to 10 -10 in order to achieve the necessary accu- 
racy and to avoid that the algorithm surpasses its minimal 
step size. On a sample basis, individual models have been 
selected and manually compared to the pre-computed 
plots provided by BioModels Database in order to check 
the correctness of the simulation results. 

Simulation of the SBML test suite 

The models from SBML Test Suite version 2.3.2 [24] were 
first simulated with the Rosenbrock solver, 10 -6 as relative 
and 10 -12 as absolute tolerance. For six models (No. 863, 
882, 893, 994, 1109, and 1121) we had to set the rela- 
tive tolerance to 10 -8 in order to simulate as accurately 
as desired. For three other models (No. 872, 987, 1052) 
the relative tolerance even had to be set to 10 -12 and the 
absolute tolerance to 10 -14 . 



Keller etal. BMC Systems Biology 201 3, 7:55 
http://www.biomedcentral.eom/1752-0509/7/55 



Page 14 of 16 



Hard- and software configuration 

For all run-time tests, an Intel® Core™ i5 CPU with 
3.33 GHz and 4 GB RAM was used with Microsoft® 
Windows® 7 (Version 6.1.7600) as operating system and 
Java Virtual Machine version 1.6.0_25. 

The Systems Biology Simulation Core Library was also 
successfully tested under Linux (Ubuntu version 10.4) and 
Mac OS X (versions 10.6.8 and 10.8.2). 

Availability and requirements 

The current version of Systems Biology Simulation Core 
Library is available at the projects homepage. The entire 
project, including source code and documentation, sev- 
eral versions of jar files containing only binaries, binaries 
together with source code, can be downloaded, option- 
ally also as a version including all required third-party 
libraries. 

Project name: Systems Biology Simulation Core Library 
Project homepage: http://simulation-core.sourceforge. 
net 

Operating systems: Platform independent, i.e., for all sys- 
tems for which a JVM is available. 
Programming language: Java™ 

Other requirements: Java Runtime Environment (JRE) 
1.6 or above 

License: GNU Lesser General Public License (LGPL) ver- 
sion 3 

Endnote 

a More than 250 available programs now support the 
SBML data format (April 19 th 2013). 
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